Automatic Natural Language Style Classification and Transformation

Authors

  • Foaad Khosmood
  • Robert A. Levinson

Abstract

Style is an integral part of natural language, whether written, spoken or machine generated. Humans have dealt with style in language since the beginnings of language itself, but computers and machine processes have only recently begun to process natural language styles. Automatic processing of style poses two interrelated challenges: classification and transformation. There have been recent advances in corpus classification, automatic clustering and authorship attribution along many dimensions, but little work has addressed writing style directly, and even less has addressed style transformation. In this paper we examine the relevant literature to define and operationalize a notion of "style", which we then use to designate style markers usable by classification machines. A measurable reading of these markers also helps guide style transformation algorithms. We demonstrate the concept by showing a detectable stylistic shift in a sample piece of text relative to a target corpus. We present ongoing work in building a comprehensive style recognition and transformation system and discuss our results.
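
The abstract treats style as a set of measurable markers that can feed a classifier and also reveal a stylistic shift toward a target corpus. The sketch below illustrates that idea under our own assumptions: the three surface markers (average word length, average sentence length, type-token ratio), the hypothetical `FUNCTION_WORDS` list, and the centroid-plus-Euclidean-distance measure are illustrative choices, not the markers or algorithm used by the authors.

```python
import math
import re
from statistics import fmean

# Hypothetical marker set chosen for illustration only; the paper's
# actual style markers are not given in this abstract.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def style_markers(text: str) -> list[float]:
    """Map a text to a vector of simple, measurable surface style markers."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        # Empty input: return an all-zero vector of the same length.
        return [0.0] * (3 + len(FUNCTION_WORDS))
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    type_token_ratio = len(set(words)) / len(words)
    # Relative frequency of each function word (a classic stylometric cue).
    fw_rates = [words.count(fw) / len(words) for fw in FUNCTION_WORDS]
    return [avg_word_len, avg_sent_len, type_token_ratio, *fw_rates]


def corpus_centroid(docs: list[str]) -> list[float]:
    """Average the marker vectors of all documents in a target corpus."""
    vectors = [style_markers(d) for d in docs]
    return [fmean(column) for column in zip(*vectors)]


def distance_to_corpus(text: str, docs: list[str]) -> float:
    """Euclidean distance between a text's markers and the corpus centroid."""
    return math.dist(style_markers(text), corpus_centroid(docs))
```

Under these assumptions, a "detectable stylistic shift" shows up as a drop in `distance_to_corpus` after rewriting: turning one long, Latinate sentence into several short, plain ones should move it closer to a corpus of short, plain sentences. The markers are left unnormalized for brevity, so sentence length dominates the distance; a fuller system would likely standardize each marker before comparing.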

Similar resources

Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts

For language modeling of spontaneous speech recognition, we propose a style transformation approach, which transforms written texts to a spoken-style language model. Since these two styles are largely different and thus direct transformation is difficult, we cascade two transformation methods; rule-based transformation to rewrite written-style texts to intermediate “verbatim” texts, and statist...

Transformation-Based Learning for Automatic Translation from HTML to XML

Format tags implicitly represent content information in the same ambiguous, context dependent manner that words represent semantics in natural language. Translation from format to content markup shares many characteristics with tagging and parsing tasks in computational linguistics. The transformation-based learning (TBL) paradigm has recently been applied to numerous computational linguistics ...

LEXICALIZING COMPUTATIONAL STYLISTICS For Language Learner Feedback

Computational stylistics refers informally to a collection of tasks within computational linguistics that deal with the style—as opposed to the semantic content—of natural language. The most famous of these tasks is perhaps authorship attribution (Stamatatos et al., 2001), which uses statistical variations in word choice to select the most likely from a fixed set of potential authors. Though ap...

Verb Clustering for Brazilian Portuguese

Levin-style classes which capture the shared syntax and semantics of verbs have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources which provide information about such classes are only available for a handful of the world's languages. Because manual development of such resources is extremely time consuming and cannot reliably capture domain va...

Mercure: Towards an Automatic E-mail Follow-up System

This paper discusses the design and the approach we have developed in order to deal effectively with customer e-mails sent to a corporation. We first present the current state of the art and then make the point that natural language tools are needed in order to deal effectively with the rather informal style encountered in the e-mails. In our project, called Mercure, we have explored three comp...

Publication date: 2009